Parallel Prefix (Scan) Algorithms for MPI

نویسندگان

  • Peter Sanders
  • Jesper Larsson Träff
چکیده

We describe and experimentally compare three theoretically well-known algorithms for the parallel prefix (or scan, in MPI terms) operation, and give a presumably novel, doubly-pipelined implementation of the in-order binary tree parallel prefix algorithm. Bidirectional interconnects can benefit from this implementation. We present results from a 32 node AMD Cluster with Myrinet 2000 and a 72-node SX-8 parallel vector system. On both systems, we observe improvements by more than a factor two over the straight-forward binomial-tree algorithm found in many MPI implementations. We also discuss adapting the algorithms to clusters of SMP nodes.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Parallel Prefix Scan with Compute Unified Device Architecture (cuda)

Parallel prefix scan, also known as parallel prefix sum, is a building block for many parallel algorithms including polynomial evaluation, sorting and building data structures. This paper introduces prefix scan and also describes a step-bystep procedure to implement prefix scan efficiently with Compute Unified Device Architecture (CUDA). This paper starts with a basic naive algorithm and procee...

متن کامل

Offloading MPI Parallel Prefix Scan (MPI_Scan) with the NetFPGA

Parallel programs written using the standard Message Passing Interface (MPI) frequently depend upon the ability to efficiently execute collective operations. MPI_Scan is a collective operation defined in MPI that implements parallel prefix scan which is very useful primitive operation in several parallel applications. This operation can be very time consuming. In this paper, we explore the use ...

متن کامل

Parallel computing using MPI and OpenMP on self-configured platform, UMZHPC.

Parallel computing is a topic of interest for a broad scientific community since it facilitates many time-consuming algorithms in different application domains.In this paper, we introduce a novel platform for parallel computing by using MPI and OpenMP programming languages based on set of networked PCs. UMZHPC is a free Linux-based parallel computing infrastructure that has been developed to cr...

متن کامل

Parleda: a Library for Parallel Processing in Computational Geometry Applications

ParLeda is a software library that provides the basic primitives needed for parallel implementation of computational geometry applications. It can also be used in implementing a parallel application that uses geometric data structures. The parallel model that we use is based on a new heterogeneous parallel model named HBSP, which is based on BSP and is introduced here. ParLeda uses two main lib...

متن کامل

Two-tree algorithms for full bandwidth broadcast, reduction and scan

We present a new, simple algorithmic idea for the collective communication operations broadcast, reduction, and scan (prefix sums). The algorithms concurrently communicate over two binary trees which both span the entire network. By careful layout and communication scheduling, each tree communicates as efficiently as a single tree with exclusive use of the network. Our algorithms thus achieve u...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2006